Analysis of interest rate¶
Normalization by historical interest-rate data
The database is updated to include data collected by the Federal Reserve on historical interest rates for 24-month personal loans.
The historical data is used to normalize LendingClub interest rates, giving information about how LendingClub rates changed independently of the changes occurring in the broader economy.
Spurious jumps in the normalized interest rates are avoided by interpolating the quarterly historical data to monthly data.
Correlation of interest rate with loan grade and term
The interest rate varies systematically with the loan grade.
For loans with a high grade, the interest rate varies systematically with loan sub-grade.
For loans with a low grade, the dependence of interest rate on sub-grade is complex.
There is a weak dependence of interest rate on loan term.
For loan grades of 'D' or higher, the mean rate for the 60-month loan is greater than or equal to the mean rate for the 36-month loan.
For the lower-grade loans, the mean rate for the 36-month loan is higher.
Correlation of interest rate with loan date
The historical interest rates for personal loans do not explain changes in LendingClub mean interest rates as a function of loan date.
For loans of grade 'D' or higher, the mean interest rate charged for loans of a given grade is essentially constant during the period covered by the dataset.
For lower-grade loans, the mean interest rate charged for loans of a particular grade tends to increase for later loan dates.
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
from IPython.display import display
from matplotlib.ticker import FuncFormatter, PercentFormatter
import notebook_tools.database as db
from notebook_tools.feature_exploration import get_group_sizes
from notebook_tools.plots import format_counts
sns.set_theme()
loan_data = db.get_loan_data()
loan_metadata = db.get_loan_metadata()
Normalization by historical interest-rate data¶
Since the 1970s, the Federal Reserve has collected data on interest rates for 24-month personal loans made by commercial banks. This data is published as part of the G.19 Statistical Release.
Although the LendingClub loans in our dataset have a term 36 months or 60 months rather than 24 months, this historical data is a measure of how the interest rates for personal loans in the broader US economy varied during the period in which the LendingClub loans were being issued.
Load and inspect the data¶
historical_data_path = Path.cwd().parent / "data" / "TERMCBPER24NS.csv"
historical_data = (
pd.read_csv(
historical_data_path,
na_values=".",
dtype={"DATE": "string", "TERMCBPER24NS": "float"},
)
.rename(columns={"DATE": "date", "TERMCBPER24NS": "int_rate"})
.assign(date=lambda df: pd.to_datetime(df["date"], format="ISO8601"))
.set_index("date")
)
display(historical_data)
| int_rate | |
|---|---|
| date | |
| 1972-02-01 | 12.50 |
| 1972-03-01 | NaN |
| 1972-04-01 | NaN |
| 1972-05-01 | 12.44 |
| 1972-06-01 | NaN |
| ... | ... |
| 2023-04-01 | NaN |
| 2023-05-01 | 11.48 |
| 2023-06-01 | NaN |
| 2023-07-01 | NaN |
| 2023-08-01 | 12.17 |
619 rows × 1 columns
plot = sns.relplot(historical_data, x="date", y="int_rate", aspect=2.5)
plot.set(
title="Average interest rate for 24-month personal loans by commercial banks"
).set_axis_labels(x_var="Date", y_var="Interest rate")
plot.facet_axis(0, 0).yaxis.set_major_formatter(PercentFormatter(decimals=0))
plt.show()
to_plot = get_group_sizes(loan_data, group_by="issue_d")
to_plot["issue_d"] = pd.to_datetime(to_plot["issue_d"], format="ISO8601")
plot = sns.relplot(to_plot, x="issue_d", y="count", aspect=2.5, kind="line", marker="o")
plot.set(title="Distribution of loan dates for LendingClub loans").set_axis_labels(
x_var="Loan date", y_var="Count"
)
ax = plot.facet_axis(0, 0)
ax.yaxis.set_major_formatter(FuncFormatter(format_counts))
plt.show()
Interpolate and filter¶
The LendingClub loans in our cleaned dataset were issued in the years 2012 - 2018, and
the loan date (issue_d) is given as a year-month combination.
In contrast, the historical data on interest rates includes a much broader time span but was only collected on a quarterly basis.
In this section, the historical data is interpolated and filtered, yielding estimated historical interest rates corresponding to the loan dates in our dataset.
Note that the historical data will be used to normalize the interest rate for LendingClub loans, and interpolation avoids spurious jumps in the normalized data.
bool_index = (historical_data.index >= pd.to_datetime("2012-01", format="ISO8601")) & (
historical_data.index <= pd.to_datetime("2018-12", format="ISO8601")
)
plot = sns.relplot(
historical_data[bool_index],
x="date",
y="int_rate",
aspect=2.5,
kind="line",
marker="o",
)
plot.set(
title=(
"Average interest rate for 24-month personal loans "
"by commercial banks, 2012 - 2018"
)
).set_axis_labels(x_var="Date", y_var="Interest rate")
plot.facet_axis(0, 0).yaxis.set_major_formatter(PercentFormatter(decimals=1))
plt.show()
historical_data = historical_data.interpolate(method="time")
historical_data = historical_data[bool_index]
plot = sns.relplot(
historical_data,
x="date",
y="int_rate",
aspect=2.5,
kind="line",
marker="o",
)
plot.set(
title=(
"Interpolated data: Average interest rate for 24-month personal loans "
"by commercial banks, 2012 - 2018"
)
).set_axis_labels(x_var="Date", y_var="Interest rate")
plot.facet_axis(0, 0).yaxis.set_major_formatter(PercentFormatter(decimals=1))
plt.show()
Add the historical data to the database¶
Convert the dates to strings before storing them in the database, since SQLite does not support a dedicated data type for dates.
historical_data = historical_data.reset_index().assign(
date=lambda df: df["date"].dt.strftime("%Y-%m")
)
if not db.table_exists("commercial_bank_interest_rates"):
db.add_tables({"commercial_bank_interest_rates": historical_data})
Normalize LendingClub interest rates¶
Set the normalization divisor to 1.0 for 2012-01.
historical_data = historical_data.assign(
divisor=historical_data["int_rate"] / historical_data.loc[0, "int_rate"]
).set_index("date")
historical_data
| int_rate | divisor | |
|---|---|---|
| date | ||
| 2012-01 | 10.711413 | 1.000000 |
| 2012-02 | 10.890000 | 1.016673 |
| 2012-03 | 10.906111 | 1.018177 |
| 2012-04 | 10.923333 | 1.019785 |
| 2012-05 | 10.940000 | 1.021341 |
| ... | ... | ... |
| 2018-08 | 10.080000 | 0.941052 |
| 2018-09 | 10.272065 | 0.958983 |
| 2018-10 | 10.457935 | 0.976336 |
| 2018-11 | 10.650000 | 0.994267 |
| 2018-12 | 10.555435 | 0.985438 |
84 rows × 2 columns
loan_data = loan_data.join(historical_data[["divisor"]], on="issue_d").assign(
normalized_int_rate=lambda df: df["int_rate"] / df["divisor"]
)
Correlation of interest rate with loan grade and term¶
In exploring how interest rate varies with loan grade and loan term, we will use the raw interest rates rather than normalized rates.
min = loan_data["int_rate"].min()
max = loan_data["int_rate"].max()
print(
'The minimum and maximum values of "int_rate" '
f"are {min} and {max}, respectively."
)
The minimum and maximum values of "int_rate" are 5.31 and 30.99, respectively.
# Define arrays / lists needed for binning the data and plotting histograms in plotly.
int_rate_bins = np.linspace(5, 31, num=27)
int_rate_bin_labels = [f"{left:d}% - {left+0.99:.2f}%" for left in range(5, 31)]
int_rate_tick_vals = int_rate_bin_labels[0::5]
int_rate_tick_text = [f"{left}%" for left in range(5, 35, 5)]
loan_data["int_rate_bin"] = pd.cut(
loan_data["int_rate"], bins=int_rate_bins, labels=int_rate_bin_labels, right=False
)
to_plot = get_group_sizes(loan_data, group_by="int_rate_bin")
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
labels={"count": "Number of loans", "int_rate_bin": "Interest rate"},
title="Distribution of loan interest rate",
)
customdata = to_plot["int_rate_bin"]
hovertemplate = "Interest rate=%{customdata}<br>Number of loans=%{y:.3s}<extra></extra>"
fig.update_traces(customdata=customdata, hovertemplate=hovertemplate)
fig.update_layout(bargap=0)
fig.update_xaxes(
tickmode="array", tickvals=int_rate_tick_vals, ticktext=int_rate_tick_text
)
fig.show()
to_plot = get_group_sizes(loan_data, group_by=["int_rate_bin", "grade"])
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
facet_row="grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
},
title="Distribution of loan interest rate by loan grade",
hover_data={"count": ":.3s"},
height=1200,
)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(title="")
fig.update_layout(bargap=0, yaxis4_title="Number of Loans")
fig.show()
Show the same figure, but with a different vertical scale for each of the axes, in order to get a better view of the distribution of loan interest rate for loans with low grades.
fig.update_yaxes(matches=None)
fig.show()
to_plot = get_group_sizes(loan_data, group_by=["int_rate_bin", "grade", "sub_grade"])
to_plot["sub_grade"] = to_plot["sub_grade"].str[1]
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
facet_row="grade",
color="sub_grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
"sub_grade": "Sub-grade",
},
title="Distribution of loan interest rate by loan grade and sub-grade",
hover_data={"count": ":.3s"},
height=1200,
)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(matches=None, title="")
fig.update_layout(bargap=0, yaxis4_title="Number of Loans")
fig.show()
to_plot = get_group_sizes(
loan_data, group_by=["int_rate_bin", "grade", "sub_grade", "term"]
).assign(
sub_grade=lambda df: df["sub_grade"].str[1],
term=lambda df: df["term"].astype(str) + " months",
)
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
facet_col="term",
facet_row="grade",
color="sub_grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
"term": "Term",
"sub_grade": "Sub-grade",
},
title="Distribution of loan interest rate by loan grade, sub-grade, and term",
facet_col_spacing=0.06,
hover_data={"count": ":.3s"},
height=1200,
)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(matches=None, showticklabels=True, title="")
fig.update_layout(bargap=0, yaxis7_title="Number of Loans")
fig.show()
The previous plots show the following:
- The interest rate varies systematically with the loan grade.
- For loans with a high grade, the interest rate varies systematically with loan sub-grade.
- For loans with a low grade, the dependence of interest rate on sub-grade is complex.
- There isn't a strong dependence of interest rate on loan term.
To get a different view of these patterns, I'll set the y-axis to show percentage of loans rather than number of loans in each category.
to_plot = get_group_sizes(
loan_data, group_by=["int_rate_bin", "grade", "sub_grade", "term"]
).assign(
sub_grade=lambda df: df["sub_grade"].str[1],
term=lambda df: df["term"].astype(str) + " months",
)
fig = px.histogram(
to_plot,
x="int_rate_bin",
y="count",
facet_col="term",
facet_row="grade",
color="sub_grade",
barnorm="fraction",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
"term": "Term",
"sub_grade": "Sub-grade",
},
title="Distribution of loan interest rate by loan grade, sub-grade, and term",
height=1200,
)
def clean_up_hovertemplate(trace):
trace.hovertemplate = trace.hovertemplate.replace(
"sum of Number of loans (normalized as fraction)", "Percentage"
)
fig.for_each_trace(clean_up_hovertemplate)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(tickformat=".2p", title_text="")
fig.update_layout(bargap=0, yaxis7_title="Percentage of loans")
fig.show()
The dependence of interest rate on loan term is not clearly visible in the plots above. To give a clearer view of this dependence, group the loans based on grade and term, and for each group, calculate the mean interest rate.
to_plot = (
loan_data[["int_rate", "grade", "term"]]
.assign(
term=loan_data["term"].map(lambda n: str(n) + " months"),
int_rate=loan_data["int_rate"] / 100,
)
.groupby(["grade", "term"])
.mean()
.reset_index()
)
fig = px.bar(
to_plot,
x="grade",
y="int_rate",
color="term",
barmode="group",
labels={"grade": "Grade", "int_rate": "Mean interest rate", "term": "Loan term"},
hover_data={"int_rate": ":.3p"},
title="Mean interest rate by loan grade and loan term",
)
fig.update_yaxes(tickformat=".0%")
fig.show()
- For loan grades of 'D' or higher, the rate for the 60-month loan is greater than or equal to the rate for the 36-month loan.
- For the lower-grade loans, the rate for the 36-month loan is higher.
Correlation of interest rate with loan date¶
to_plot = (
loan_data[["int_rate", "normalized_int_rate", "issue_d"]]
.groupby("issue_d")
.mean()
.reset_index()
.rename(columns={"int_rate": "raw", "normalized_int_rate": "normalized"})
.melt(id_vars="issue_d", var_name="type", value_name="int_rate")
.assign(int_rate=lambda df: df["int_rate"] / 100)
)
fig = px.line(
to_plot,
x="issue_d",
y="int_rate",
color="type",
facet_row="type",
markers=True,
labels={
"int_rate": "Mean interest rate",
"issue_d": "Loan date",
"type": "Interest-rate type",
},
hover_data={"int_rate": ":.3p"},
title="Mean interest rate by loan date",
height=400,
)
fig.for_each_annotation(
lambda ann: ann.update(text=ann.text.replace("Interest-rate type", "Type"))
)
fig.update_yaxes(tickformat=".2p", title="Interest rate")
fig.show()
Although there are distinct differences between the curves for raw and normalized interest rates, the two curves are qualitatively similar.
The historical interest rates do not explain the variation in the mean interest rate as a function of loan date.
The analysis earlier in this notebook suggests that loan grade is the dominant factor determining interest rate. Check the dependence of interest rate on loan date for loans of a particular grade.
to_plot = (
loan_data[["int_rate", "grade", "issue_d"]]
.groupby(["issue_d", "grade"])
.mean()
.reset_index()
.assign(int_rate=lambda df: df["int_rate"] / 100)
)
to_plot
fig = px.line(
to_plot,
x="issue_d",
y="int_rate",
color="grade",
facet_row="grade",
markers=True,
labels={
"int_rate": "Mean interest rate",
"issue_d": "Loan date",
"grade": "Loan grade",
},
hover_data={"int_rate": ":.3p"},
title="Mean interest rate by loan grade an loan date",
height=800,
)
fig.for_each_annotation(
lambda ann: ann.update(text=ann.text.replace("Loan grade", "Grade"))
)
fig.update_yaxes(tickformat=".2p", title="")
fig.update_layout(bargap=0, yaxis4_title="Interest rate")
fig.show()
For loans of grade 'D' or higher, the mean interest rate charged for loans of a given grade is essentially constant during the period covered by the dataset.
For lower-grade loans, the mean interest rate charged for loans of a particular grade tends to increase for later loan date.